414333 - Özgür Polat
417121 - Hüseyin Can Minareci
With more and more customers abandoning its credit card program, a manager at the bank is concerned. The bank would like to predict which customers are likely to churn, so that it can approach them proactively, offer them better value, and change their minds. The dataset was compiled for this purpose; the source we acquired it from received it from https://leaps.analyttica.com/
In this project we try to shed more light on the big picture, using the skills we gained in the Advanced Visualization in R course at the Faculty of Economic Sciences, University of Warsaw.
It is best to start by understanding the variables we have and their definitions.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(cowplot)
library(ggforce)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(ggpubr)
##
## Attaching package: 'ggpubr'
## The following object is masked from 'package:cowplot':
##
## get_legend
churn <- read.csv("Data/churn.csv", sep = ',')
churnN <- read.csv("Data/churn.csv", sep = ',', na.strings = c("NA", "N/A", "Unknown"))
colSums(churn[,c(6,7,8)]=="Unknown")
## Education_Level Marital_Status Income_Category
## 1519 749 1112
## These Unknown values are treated as missing (NA) and the affected rows are dropped
churnNN <- drop_na(churnN)
prop.table(table(churn$Attrition_Flag))
##
## Attrited Customer Existing Customer
## 0.1606596 0.8393404
prop.table(table(churnNN$Attrition_Flag))
##
## Attrited Customer Existing Customer
## 0.1571812 0.8428188
After dropping the rows with Unknown values, the distribution of Attrition_Flag stays very similar (about 15.7% attrited customers versus 16.1% before), so we drop those rows to get a cleaner EDA.
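The comparison above can be reproduced on a small made-up table. The sketch below is only an illustration: the CSV text and the names `toy`, `toy_na`, and `toy_clean` are invented and are not the bank data. It shows how `na.strings` turns "Unknown" into `NA` and how the class shares can be compared before and after dropping incomplete rows.

```r
# Toy CSV text; values are invented for illustration only
csv_text <- "Attrition_Flag,Education_Level
Existing Customer,Graduate
Existing Customer,Unknown
Attrited Customer,College
Existing Customer,High School
Attrited Customer,Unknown
Existing Customer,Graduate"

# Read once as-is and once with "Unknown" mapped to NA
toy    <- read.csv(text = csv_text)
toy_na <- read.csv(text = csv_text, na.strings = c("NA", "N/A", "Unknown"))

# Count the Unknown labels, then drop the incomplete rows
n_unknown <- sum(toy$Education_Level == "Unknown")
toy_clean <- toy_na[complete.cases(toy_na), ]

# Class shares before and after the drop
share_before <- prop.table(table(toy$Attrition_Flag))
share_after  <- prop.table(table(toy_clean$Attrition_Flag))
print(share_before)
print(share_after)
```

On real data the two tables should be close to each other before the rows are dropped; a large shift would suggest that the Unknown rows are not missing at random.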
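# Grey background: the full data without Education_Level, so it repeats in every facet
ggplot(churnNN, aes(x = Credit_Limit, fill = Education_Level)) +
  geom_histogram(data = churn[, -6], fill = "grey", alpha = .5) +
  geom_histogram(colour = "black") +
  facet_wrap(~ Education_Level) +
  guides(fill = "none")  # facet titles already name each level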
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
unique(churnNN$Education_Level)
## [1] "High School" "Graduate" "Uneducated" "College"
## [5] "Post-Graduate" "Doctorate"
# Pairs of adjacent education levels to compare
my_comp <- list(c("Uneducated", "High School"), c("High School", "College"),
                c("College", "Graduate"), c("Graduate", "Post-Graduate"),
                c("Post-Graduate", "Doctorate"))
ggviolin(churnNN, x = "Education_Level", y = "Total_Revolving_Bal",
         fill = "Education_Level", palette = "jco",
         add = "boxplot", add.params = list(fill = "white")) +
  stat_compare_means(method = 'anova') +
  stat_compare_means(comparisons = my_comp)
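As a rough illustration of what the bracketed comparisons in the violin plot report: when given pairs of groups, `stat_compare_means()` uses the Wilcoxon rank-sum test by default. The sketch below runs the same kind of test in base R on simulated balances. The data frame `toy`, its group means, the list `toy_comp`, and the helper `pair_p()` are hypothetical and only stand in for `churnNN` and `my_comp`.

```r
set.seed(42)
# Simulated balances for three education levels (not the real data)
toy <- data.frame(
  Education_Level = rep(c("Uneducated", "High School", "College"), each = 30),
  Total_Revolving_Bal = c(rnorm(30, 1100, 300),
                          rnorm(30, 1150, 300),
                          rnorm(30, 1200, 300))
)

# Same structure as my_comp: a list of group pairs to compare
toy_comp <- list(c("Uneducated", "High School"), c("High School", "College"))

# Hypothetical helper: Wilcoxon rank-sum p-value for one pair of groups
pair_p <- function(pair, data) {
  x <- data$Total_Revolving_Bal[data$Education_Level == pair[1]]
  y <- data$Total_Revolving_Bal[data$Education_Level == pair[2]]
  wilcox.test(x, y)$p.value
}

p_values <- vapply(toy_comp, pair_p, numeric(1), data = toy)
names(p_values) <- vapply(toy_comp, paste, character(1), collapse = " vs ")
print(p_values)
```

If many pairs are compared, base R's `p.adjust()` can correct the resulting p-values for multiple testing.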
library(heatmaply)
## Loading required package: plotly
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## Loading required package: viridis
## Loading required package: viridisLite
library(plotly)
library(ggcorrplot)
churn_numeric <- select_if(churnNN, is.numeric)
churn_ready_for_corr <- churn_numeric %>%
  select(1:15)
# Compute correlation coefficients
corr <- churn_ready_for_corr %>%
  cor()
# Compute correlation p-values
cor.test.p <- function(x){
  FUN <- function(x, y) cor.test(x, y)[["p.value"]]
  z <- outer(
    colnames(x),
    colnames(x),
    Vectorize(function(i,j) FUN(x[,i], x[,j]))
  )
  dimnames(z) <- list(colnames(x), colnames(x))
  z
}
p <- cor.test.p(churn_ready_for_corr)
# Create the heatmap
heatmaply_cor(
  corr,
  node_type = "scatter",
  point_size_mat = -log10(p),
  point_size_name = "-log10(p-value)",
  label_names = c("x", "y", "Correlation")
)
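One caveat with sizing points by `-log10(p)`: on the diagonal each variable is tested against itself, the p-value is typically 0, and `-log10(0)` is `Inf`. A minimal sketch of one possible workaround follows, using simulated data. The names `toy`, `p_toy`, and `size_mat` are hypothetical, and capping at the largest finite value is our assumption, not something heatmaply requires.

```r
set.seed(1)
# Simulated numeric data standing in for churn_ready_for_corr
toy <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))

# Pairwise p-values, computed the same way as in cor.test.p()
vars <- colnames(toy)
p_toy <- outer(
  vars, vars,
  Vectorize(function(i, j) cor.test(toy[, i], toy[, j])$p.value)
)
dimnames(p_toy) <- list(vars, vars)

# Replace non-finite point sizes with the largest finite one
size_mat <- -log10(p_toy)
size_mat[!is.finite(size_mat)] <- max(size_mat[is.finite(size_mat)])
print(size_mat)
```

The capped matrix could then be passed as `point_size_mat` in place of `-log10(p)`.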